From free shallow monolingual resources to machine translation systems: easing the task
Authors
Abstract
The availability of machine-readable bilingual linguistic resources is crucial not only for machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources demands extensive manual work. This paper describes a methodology for automatically building bilingual dictionaries and transfer rules by extracting knowledge from word-aligned parallel corpora processed with free shallow monolingual resources (morphological analysers and part-of-speech taggers). Experiments on Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts have shown promising results.
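The abstract does not spell out the extraction procedure, so the following is only a minimal sketch of the general idea of reading dictionary entries off a word-aligned corpus. It assumes each sentence pair comes with a list of (source index, target index) alignment links; the function name `extract_dictionary`, the `min_count` threshold, and the toy Portuguese–Spanish data are all hypothetical, and the morphological analysis, part-of-speech tagging, and transfer-rule induction mentioned in the abstract are left out.

```python
from collections import Counter, defaultdict


def extract_dictionary(sentence_pairs, min_count=2):
    """Build a source -> target word dictionary from word alignments.

    sentence_pairs: iterable of (src_tokens, tgt_tokens, alignment), where
    alignment is a list of (i, j) pairs linking src_tokens[i] to tgt_tokens[j].
    min_count: hypothetical threshold; pairs seen fewer times are dropped.
    """
    counts = defaultdict(Counter)
    for src_tokens, tgt_tokens, alignment in sentence_pairs:
        for i, j in alignment:
            counts[src_tokens[i]][tgt_tokens[j]] += 1

    dictionary = {}
    for src_word, tgt_counts in counts.items():
        # Keep only the most frequently aligned target word.
        tgt_word, n = tgt_counts.most_common(1)[0]
        if n >= min_count:
            dictionary[src_word] = tgt_word
    return dictionary


# Toy, hand-aligned data (illustrative only, not from the paper).
corpus = [
    (["a", "casa", "branca"], ["la", "casa", "blanca"], [(0, 0), (1, 1), (2, 2)]),
    (["a", "casa", "grande"], ["la", "casa", "grande"], [(0, 0), (1, 1), (2, 2)]),
]

print(extract_dictionary(corpus))
```

With the default threshold, only pairs aligned at least twice survive; lowering `min_count` trades precision for coverage on small corpora.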
Similar resources
From free shallow monolingual resources to machine translation systems
The availability of machine-readable bilingual linguistic resources is crucial not only for machine translation but also for other applications such as cross-lingual information retrieval. However, the building of such resources demands extensive manual work. This paper describes a methodology to build automatically bilingual dictionaries and transfer rules by extracting knowledge from word-ali...
Expanding Parallel Resources for Medium-Density Languages for Free
We discuss a previously proposed method for augmenting parallel corpora of limited size for the purposes of machine translation through monolingual paraphrasing of the source language. We develop a three-stage shallow paraphrasing procedure to be applied to the Swedish-Bulgarian language pair for which limited parallel resources exist. The source language exhibits specifics not typical of high-...
Sharing resources between free/open-source rule-based machine translation systems: Grammatical Framework and Apertium
In this paper, we describe two methods developed for sharing linguistic data between two free and open source rule based machine translation systems: Apertium, a shallow-transfer system; and Grammatical Framework (GF), which performs a deeper syntactic transfer. In the first method, we describe the conversion of lexical data from Apertium to GF, while in the second one we automatically extract ...
Experiments with Term Translation
In this article we investigate the translation of financial terms from English into German in the isolation of an ontology vocabulary. For this study we automatically built new domain-specific resources from the translation search engine Linguee and from the online encyclopaedia Wikipedia. Due to the fact that we performed the translation approach on a monolingual ontology, we ran several sub-e...
Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling
This paper presents the machine translation systems submitted by the Abu-MaTran project for the Finnish–English language pair at the WMT 2015 translation task. We tackle the lack of resources and complex morphology of the Finnish language by (i) crawling parallel and monolingual data from the Web and (ii) applying rule-based and unsupervised methods for morphological segmentation. Several stati...